An AUV Target-Tracking Method Combining Imitation Learning and Deep Reinforcement Learning
Authors
Abstract
This study aims to solve the problems of sparse reward and local convergence that arise when a reinforcement learning algorithm is used as the controller of an AUV. Based on the generative adversarial imitation learning (GAIL) algorithm combined with a multi-agent approach, a multi-agent GAIL (MAG) algorithm is proposed. GAIL enables the AUV to learn directly from expert demonstrations, overcoming the difficulty of slow initial training of the network. Parallel training of multiple agents reduces the high correlation between samples and helps avoid local convergence. In addition, a reward function is designed to help training. Finally, the results show that, in tests on the Unity simulation platform, the proposed algorithm has a strong optimal decision-making ability in the tracking process.
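To illustrate how learning from expert demonstrations can ease the sparse-reward problem mentioned in the abstract, here is a minimal sketch of the surrogate reward commonly used in GAIL-style training. It is not the paper's implementation. It assumes a discriminator that outputs the probability that a state-action pair came from the expert; the function names `gail_reward` and `discriminator_loss` and the clamp value `eps` are hypothetical choices made for this example.

```python
import math


def gail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Dense surrogate reward from a discriminator output.

    Assumption: d_prob is the discriminator's estimate that the
    (state, action) pair came from the expert, in [0, 1].
    The reward -log(1 - D) grows as the pair looks more expert-like.
    """
    d = min(max(d_prob, eps), 1.0 - eps)  # clamp to keep log finite
    return -math.log(1.0 - d)


def discriminator_loss(expert_probs, policy_probs, eps: float = 1e-8) -> float:
    """Binary cross-entropy with expert pairs labeled 1 and policy pairs labeled 0."""
    def clamp(p):
        return min(max(p, eps), 1.0 - eps)

    expert_term = -sum(math.log(clamp(p)) for p in expert_probs) / len(expert_probs)
    policy_term = -sum(math.log(1.0 - clamp(p)) for p in policy_probs) / len(policy_probs)
    return expert_term + policy_term


# Samples gathered by several agents running in parallel can be pooled
# into one batch before the discriminator update (illustrative values).
agent_batches = [[0.2, 0.4], [0.3], [0.1, 0.5]]
pooled_policy_probs = [p for batch in agent_batches for p in batch]
loss = discriminator_loss([0.9, 0.8], pooled_policy_probs)
```

Because every step receives a reward from the discriminator, the policy gets a learning signal even when the environment reward is sparse. Pooling samples from several agents is one simple way to mix trajectories so that consecutive samples are less correlated.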
Similar resources
Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning
In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go oracle on the planning horizon and demonstrate that the cost-to-go oracle shortens the learner’s planning horizon as a function of its accuracy: a globally optimal oracle can shorten the planning horizon to one, leading t...
Rainbow: Combining Improvements in Deep Reinforcement Learning
The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari ...
Imitation in Reinforcement Learning
The promise of imitation is to facilitate learning by allowing the learner to observe a teacher in action. Ideally this will lead to faster learning when the expert knows an optimal policy. Imitating a suboptimal teacher may slow learning, but it should not prevent the student from surpassing the teacher’s performance in the long run. Several researchers have looked at imitation in the context ...
Hierarchical Imitation and Reinforcement Learning
We study the problem of learning policies over long time horizons. We present a framework that leverages and integrates two key concepts. First, we utilize hierarchical policy classes that enable planning over different time scales, i.e., the high level planner proposes a sequence of subgoals for the low level planner to achieve. Second, we utilize expert demonstrations within the hierarchical ...
Journal
Journal title: Journal of Marine Science and Engineering
Year: 2022
ISSN: 2077-1312
DOI: https://doi.org/10.3390/jmse10030383